The ability of a protein to recognise multiple independent
target conformations was demonstrated in [1]. Here we
consider the recognition of correlated configurations, which
we apply to funnel design for a single conformation. The
maximum basin of attraction, as parametrised in our model,
depends on the number of amino acid species A as ln A,
independent of protein length. We argue that the extent to
which the protein energy landscape can be manipulated is
fixed, effecting a trade-off between well breadth, well depth
and well number. This clarifies the scope and limits of
protein and heteropolymer function.
It is believed that a stable, fast folding protein requires
an energy landscape in which the native conformation both
is a deep global minimum and lies at the bottom of a basin
of attraction sloping towards it [?]. These conditions are
known as thermodynamic stability and kinetic accessibility,
respectively. While stability may be readily achieved by
suppressing the energy of the sequence arranged in the
target conformation, constructing a broad funnel leading
towards the target has remained elusive.

The first satisfactory method of protein design, introduced
by Shakhnovich in 1994, relies on the correlation between
stability and accessibility: stable sequences are found to
fold more quickly as well. Minimising the energy or relative
energy over sequence space while the conformation remains
quenched to the target yields protein sequences which fold
much more rapidly than random heteropolymers of equal
length. We have provided evidence, nonetheless, that the
most stable sequences are not the fastest folding, and that
a reduction in stability allows a significant gain in
efficiency [?].

In this Letter we investigate the introduction of a folding
funnel above the target conformation in the protein energy
landscape. Our method of design relies on the technique of
training to multiple targets discussed in [1]. Unlike the
independent conformations previously considered, here our
patterns are correlated to a single target conformation. We
find that the extent of the optimal folding funnel, as
parametrised by our model, is smaller than the
conformational space and depends on the number of amino
acid species available. The influence of alphabet size on
the folding performance of untrained sequences is considered
in [?].

Our approach to funnel design is to turn off all the
monomer interactions (equivalent to an interacting system
at infinite temperature) and to consider the dynamics by
which a protein would then spontaneously unfold from the
target state into a random ensemble. By the principle of
detailed balance in equilibrium statistical mechanics, the
ensemble of unfolding trajectories from the target state to
random conformations is equivalent to the ensemble of
folding trajectories from random configurations to the
target, but of course the former ensemble is much more
easily sampled. Therefore, observations of unfolding will
tell us how the molecule would fold with least dynamical
constraint.

We provide estimates of the unfolding contact map 
based on a blob model of unfolding. This is motivated by 
thermodynamic tractability and its basis in established 
polymer physics, despite its at times unrealistic repre- 
sentation of kinetics. It leads to a definite proposal as to 
how different stages in the unfolding contact map should 
be weighted in training so as to create an optimal funnel. 

We find, however, that training to the ideal folding funnel
cannot be achieved. Remarkably, the bound on funnel size
(in terms of a relaxation length scale) is identical to the
thermodynamic capacity derived in [1]. Taken together, our
results suggest that the extent to which the protein energy
landscape can be manipulated, whether by the introduction
of multiple independent minima, well depth or well breadth
(or a combination thereof), is limited and proportional to
the log of the number of amino acid species.

Generalisation to Weighted Training. In a separate Letter
[1] we investigated the design of multi-stable proteins by
training to a uniform superposition of p contact maps. The
typical well depth of a protein of length N embedded in one
of the target conformations was found to be

$E^{\min} \simeq -\sqrt{z'/p}\; N \sigma \sqrt{\ln A}$, (1)

where A is the number of amino acid species, $\sigma$ is the
standard deviation of the interaction energies and
$z' \simeq z - 2$ is the effective coordination number, i.e.,
the maximum number of local contacts excluding the
backbone. After training to a weighted superposition of
contact maps, we expect conformations associated with
higher weights to have deeper wells. The derivation of the
precise dependence follows.

The total contact map is defined by summing over the
individual maps with suitable weights,

$C_{\mathrm{tot}\,ij} = \sum_{\mu=1}^{p} w_\mu\, C_{\mu\,ij}$, (2)

where $w_\mu$ is the weight associated with conformation
$\Gamma_\mu$. The minimum Hamiltonian associated with the total
weighted contact map is

where U* minimises $H_{\mathrm{tot}}$.

By analogy with calculations in [1] we re-express (3) as a
sum over $H_{\mathrm{tot}\,i}$, each minimised by the choice of
$S_i$,

where $H_{\mathrm{tot}\,i}$ is the sum over connections to
monomer i,

The local Hamiltonian $H_{\mathrm{tot}\,i}$ is simply a weighted
sum of the independent local conformational energies,

Proceeding as in [1], we approximate the distribution of
$H_{\mathrm{tot}\,i}$ by its central limit form; it is a Gaussian
with variance
$\sigma^2_{\mathrm{tot}\,i} = \tfrac{z'}{2}\sigma^2 \sum_{\mu=1}^{p} w_\mu^2$.
This estimate is valid out

to $|H_{\mathrm{tot}\,i}| \sim \tfrac{z'}{2}\,\sigma \sum_{\mu=1}^{p} w_\mu$.

We now consider $H_{\mathrm{tot}\,i}$ in (6) as a sum of two
terms: the contribution $H_{\mu i}$ of conformation $\mu$ and
the contribution $H_{\mathrm{oth}\,i}$ of the remaining p − 1
conformations. Since $H_{\mu i}$ and $H_{\mathrm{oth}\,i}$ are
independently Gaussian distributed with variances

the distribution of $H_{\mu i}$ for fixed
$H_{\mu i} + H_{\mathrm{oth}\,i} = H_{\mathrm{tot}\,i}$ reduces to

where c is a normalising constant and
$\sigma^2_{\mathrm{tot}\,i} = \sigma^2_{\mu i} + \sigma^2_{\mathrm{oth}\,i}$.



The maximum-likelihood value of $H_{\mu i}$ under this
distribution is given by

$H_{\mu i} = \frac{\sigma^2_{\mu i}}{\sigma^2_{\mathrm{tot}\,i}}\, H_{\mathrm{tot}\,i}$. (11)

The minimum local Hamiltonian corresponds to the smallest
of A samples from the distribution of $H_{\mathrm{tot}\,i}$. We
approximate the minimum of A samples from a Gaussian of
zero mean and standard deviation $\sigma_{\mathrm{tot}\,i}$ by [?]



$H^{\min}_{\mathrm{tot}\,i} \simeq -\sqrt{2}\,\sigma_{\mathrm{tot}\,i}\sqrt{\ln A}$. (12)
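As a quick numerical sanity check on the leading-order estimate (12), the sketch below compares it with a Monte Carlo average of the smallest of A Gaussian samples. The function names, trial count and seed are illustrative choices, not taken from the calculation above. The estimate is asymptotic in A, so for small alphabets it overstates the depth of the minimum, with the discrepancy shrinking as A grows.

```python
import math
import random


def mean_minimum(num_species, sigma=1.0, trials=5000, seed=0):
    """Monte Carlo average of the smallest of `num_species` samples
    drawn from a zero-mean Gaussian of standard deviation `sigma`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.gauss(0.0, sigma) for _ in range(num_species))
    return total / trials


def leading_order_estimate(num_species, sigma=1.0):
    """Leading-order estimate of the minimum: -sigma * sqrt(2 ln A)."""
    return -sigma * math.sqrt(2.0 * math.log(num_species))


# Alphabet sizes chosen for illustration only (A = 2 mimics an H-P model).
results = {A: (mean_minimum(A), leading_order_estimate(A)) for A in (2, 20, 200)}

for A, (sampled, estimate) in results.items():
    print(f"A={A:4d}  sampled={sampled:.3f}  estimate={estimate:.3f}")
```

The sampled minimum stays shallower than the estimate for every alphabet size tried here, which is the expected behaviour of a leading-order extreme-value formula.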
Substituting (12) into (11) and summing over i yields

This establishes how the minimised Hamiltonian distributes
over the individual weighted configurations; for the
special case of equal weights it duly reduces to (1).
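A minimal sketch of the steps behind this result, under stated assumptions rather than as a transcription of the original equations: the two contributions are taken to be independent zero-mean Gaussians, and the per-conformation variance is assumed to be $\sigma^2_{\mu i} = \tfrac{z'}{2}\sigma^2 w_\mu^2$, a form chosen here because it reproduces the $\sqrt{z'/p}$ dependence of the well depth for equal weights.

```latex
\begin{align*}
% Distribution of H_{\mu i} at fixed total H_{tot i}
P(H_{\mu i}) &\propto \exp\!\left[
   -\frac{H_{\mu i}^2}{2\sigma_{\mu i}^2}
   -\frac{(H_{\mathrm{tot}\,i}-H_{\mu i})^2}{2\sigma_{\mathrm{oth}\,i}^2}\right],\\
% Setting the derivative of the exponent to zero
H_{\mu i} &= \frac{\sigma_{\mu i}^2}{\sigma_{\mu i}^2+\sigma_{\mathrm{oth}\,i}^2}\,
             H_{\mathrm{tot}\,i}
           = \frac{\sigma_{\mu i}^2}{\sigma_{\mathrm{tot}\,i}^2}\,H_{\mathrm{tot}\,i},\\
% Inserting the minimum of A Gaussian samples
H^{\min}_{\mu i} &\simeq -\sqrt{2}\,
   \frac{\sigma_{\mu i}^2}{\sigma_{\mathrm{tot}\,i}}\sqrt{\ln A},\\
% Equal weights w_\mu = 1: sigma_{\mu i}^2 = z' sigma^2 / 2,
% sigma_{tot i}^2 = p z' sigma^2 / 2
\sum_{i=1}^{N} H^{\min}_{\mu i} &\simeq -\sqrt{z'/p}\;N\sigma\sqrt{\ln A}.
\end{align*}
```

The last line is the equal-weight check: the weighted result collapses to the uniform-superposition well depth, as the text requires.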

Blob Model of Unfolding. It is a well known trend in
polymer physics that the larger scale features of molecular
conformations have systematically longer relaxation times.
For example, for non-interacting chains with simple
kink-jump dynamics, a subsection of g monomer units has
relaxation time $\tau(g)$ proportional to $g^2$. On this
basis we assume that after time t, a spontaneously
unfolding polymer will have equilibrated locally up to
scale g, such that $\tau(g) = t$, but still reflect the
folded conformation on larger scales.

This blob view of proteins, that time scales relate uni- 
formly to length scales, is of course a particular and sim- 
plified outlook, motivated by its tractability. Compli- 
cations which we do not address here include spatially 
localised nucleation events and specific configurational 
bottlenecks. Nevertheless, it allows us to make some
quantitative predictions about the limits of the basin of
attraction, which has long proved elusive.

The folded protein, which we assume to be compact and
associate with g = 1, consists of N single monomer blobs.
The contact map C(1) has z' non-zero entries in each row
and column, giving z'N non-zero entries in total.

For the state unfolded up to length scale g, the protein
may be thought of as a chain of N/g blobs, folded to its
coarse grained original conformation. Accordingly, the
contact map C(g) has N/g intra-blob blocks along the
diagonal and z'N/g inter-blob blocks corresponding to
nearest neighbour blobs (not along the backbone). Scaling
theories for polymer configurations with excluded volume
would imply that the average total number of contacts
between two neighbouring blobs be of order unity. Averaging
over an ensemble of conformations at constant g, this
requires that each of the $g^2$ entries in each such block
be of order $1/g^2$.

The total number of conformations (compact or otherwise)
available to a protein grows as $\kappa^N$ (not to be
confused with k ≃ 1.85 for compact structures only); this
becomes $\kappa^{N/g}$ for a chain of N/g blobs. Since the
product of the internal and external conformational
freedoms of a partially relaxed protein must equal
$\kappa^N$, a protein relaxed to length scale g can be
estimated to take on $\kappa^{N - N/g}$ configurations. It
follows that the entropy gained in folding from a denatured
configuration down to a conformation relaxed to length
scale g is

$S(g) \simeq -\frac{N}{g}\ln\kappa$. (14)
Training to a Funnel. While an energy minimum
significantly below the minimum copolymer energy ensures
thermodynamic stability of the target conformation, rapid
convergence necessitates a funnel of kinetic pathways
sloping towards the target. The widest possible funnel is
that which least constrains the dynamics, which we propose
is given by the conformations sampled in unfolding via the
blob model. We thus consider combining the contact maps
from different times (and values of g) of a noninteracting,
spontaneously unfolding compact conformation with weights
w(g),



$C_{\mathrm{tot}\,ij} = \int_{g=1}^{g=N} w(g)\, C_{ij}(g)\, d(\ln g)$. (15)



The minimum Hamiltonian associated with the total contact
map then appears as

analogous to (3). The total Hamiltonian associated with
monomer i is the sum of the individual local Hamiltonians
evaluated at different values of g,

where H(g) = w(g)E(g). In accordance with our previous
calculation, we require $\sigma_{\mathrm{tot}\,i}$. We first
estimate the variance in the choice of H(g) available to a
single monomer as



$\sigma^2(g) \simeq \frac{z' g}{2}\left(\frac{w(g)}{g^2}\right)^2 \sigma^2$, (18)

where z'g is the number of contacts available to a given
monomer equilibrated to scale g and $w(g)/g^2$ is the
overall weighting for each one. The variance of the local
energy per monomer integrated over all g is then
Again we wish to establish how the minimised Hamiltonian
distributes over weighted configurations unfolded to length
scale g. Applying the general result of the weighted
training calculation yields

Substituting the preceding estimates, including (18), into
(20) and summing over i, the minimum energy associated with
matching the conformation at scale g can then be estimated
as



$E^{\min}(g) \simeq -N\,\frac{z'\sigma^2\, w(g)}{\sqrt{2}\; g^3\, \sigma_{\mathrm{tot}\,i}}\sqrt{\ln A}$. (21)
In order that the training reverse the unfolding dynamics,
the required funnel must have sufficient slope, that is,
F(g) = E(g) − TS(g) < 0. Equating the two expressions
T × (14) and (21) gives

and thus $w(g) \propto g^2$. Unfortunately this form for w is
inconsistent with a convergent (N-independent) evaluation
of $\sigma_{\mathrm{tot}\,i}$ in (19). Our assumption that the
training energy could reverse the unfolding dynamics does
not hold for all values of g.
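The inconsistency can be made explicit with a short sketch. It rests on assumptions that are not all legible in the expressions above: an entropy $S(g) \simeq -(N/g)\ln\kappa$, a per-scale variance $\sigma^2(g) \simeq \tfrac{z'g}{2}\,(w(g)/g^2)^2\sigma^2$, a superposition taken with logarithmic measure $d(\ln g)$, and $E(g) = H(g)/w(g)$.

```latex
\begin{align*}
% Minimum energy at scale g from the maximum-likelihood share of H_tot
E^{\min}(g) &\simeq -N\,\frac{\sigma^2(g)}{w(g)\,\sigma_{\mathrm{tot}\,i}}
               \sqrt{2\ln A}
            = -N\,\frac{z'\sigma^2\,w(g)}{\sqrt{2}\,g^3\,\sigma_{\mathrm{tot}\,i}}
               \sqrt{\ln A},\\
% Marginal funnel slope: E = T S
E^{\min}(g) = T S(g)
 \;&\Rightarrow\;
 w(g) = \frac{\sqrt{2}\,T\ln\kappa\;\sigma_{\mathrm{tot}\,i}}
             {z'\sigma^2\sqrt{\ln A}}\; g^2,\\
% Self-consistency of the integrated variance
\sigma_{\mathrm{tot}\,i}^2
  &= \int_{1}^{N}\sigma^2(g)\,\frac{dg}{g}
   = \frac{T^2\ln^2\!\kappa}{z'\sigma^2\ln A}\;
     \sigma_{\mathrm{tot}\,i}^2 \int_{1}^{N} dg .
\end{align*}
```

The remaining integral grows linearly with N, so the condition cannot be met for arbitrary chain length. Restricting the upper limit to a cutoff $g_{\max}$ instead gives, under the same assumptions, $g_{\max} - 1 \simeq z'\sigma^2 \ln A / (T^2 \ln^2\kappa)$, which grows rapidly as T falls.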

We consequently introduce the cutoff scale $g_{\max}$, up to
which our funnel extends. Substituting this form of w(g)
into (19) and reducing the domain of integration yields

from which it follows that
The width of our funnel, as parametrised by $g_{\max}$
above, increases strongly as the folding temperature T
decreases. At too low a temperature, however, the coil will
collapse as a random copolymer into what we presume to be a
glassy state. The loss in entropy resulting from collapse
will be equivalent to −(14) evaluated at g = 1 (the
collapsed copolymer will be fully folded). The modest
decrease in energy afforded by the minimum copolymer energy
can overcome this entropic loss only at low temperature
$T_{cp}$. Equating the minimum copolymer energy from (7) in
[1] and $T_{cp}$ times the loss in entropy at g = 1 leads to

and hence at $T \simeq T_{cp}$

which is identical to the form of $p_{\max}$ derived in [1].

Discussion of Capacity. That the bound on the folding
funnel $g_{\max}$ is less than N implies that the extent of
the achievable folding funnel is less than the
conformational space of the protein. Folding at finite
temperature cannot be made as direct as unfolding at
infinite temperature. The cutoff $g_{\max}$ is the length
scale of the structure below which the energy landscape
corresponding to the trained sequence is characterised by a
funnel. Above $g_{\max}$, the protein must organise itself
into the desired (coarse grained) conformation without the
help of kinetic guidance, that is, it must traverse an
effective copolymer landscape (Figure 1). What happens to
the protein energy landscape upon increasing the width of
the funnel? As $g \to g_{\max}$, the slope of the funnel
becomes so shallow that, at $g = g_{\max}$, the decrease in
energy no longer overcomes the loss of entropy (Figure 2);
the well ceases to be a free energy minimum.

Consider the protein as a sequence of $N/g_{\max}$ blobs,
each of size $g_{\max}$. The benefit of the funnel is
realised once the chain of blobs folds to its coarse
grained target state. Assuming this statistical bottleneck
to be the rate determining step, the time necessary for the
protein to fold is reduced by the factor
$\kappa^{N(1-1/g_{\max})}$, which is significant even for
small values of $g_{\max}$.
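To give a feel for the magnitudes involved, the sketch below evaluates the funnel cutoff and the corresponding reduction in folding time. It assumes the bound takes the form ln A / ln k and reads the reduction factor as κ^{N(1 − 1/g_max)}. The function names are hypothetical, k = 1.85 is the compact value quoted earlier, and κ = 4.68 (roughly the self-avoiding-walk growth constant on a cubic lattice) is an illustrative choice that does not come from this Letter.

```python
import math


def funnel_cutoff(num_species, compact_freedom=1.85):
    """Assumed form of the bound: g_max ~ ln A / ln k."""
    return math.log(num_species) / math.log(compact_freedom)


def log10_speedup(length, g_max, total_freedom):
    """log10 of the assumed reduction factor kappa**(N * (1 - 1/g_max))."""
    return length * (1.0 - 1.0 / g_max) * math.log10(total_freedom)


# Illustrative inputs: a 100-monomer chain, kappa = 4.68 (an assumption).
chain_length = 100
kappa = 4.68

for alphabet in (2, 20):
    g_max = funnel_cutoff(alphabet)
    gain = log10_speedup(chain_length, g_max, kappa)
    print(f"A={alphabet:2d}  g_max={g_max:.2f}  log10(speed-up)={gain:.1f}")
```

Working in log10 avoids overflow for long chains. With g_max = 1 the factor is exactly unity, so a two-letter alphabet, whose cutoff is barely above one, gains comparatively little.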

Manipulation of the Energy Landscape. In both the
thermodynamic [1] and kinetic contexts, the extent to which
the protein energy landscape can be manipulated is limited
by $\ln A / \ln k$, where A is the number of amino acid
species and k is the compact conformational freedom per
monomer. Like squeezing one end of a balloon at the expense
of inflating the other, further deformation of the energy
landscape is counter-balanced by its relaxation elsewhere.

The agreement between the bounds on protein mem- 
ory, on the one hand, and the basin of attraction, on 
the other, was unexpected. Taken together, these results 
suggest that the engineering of proteins and heteropoly- 
mers is constrained by a fixed budget. The finite freedom 
in the sequence can be invested in various attributes: in 
well number, well breadth and well depth. A reduction 
in expenditure in one allows increased investment in an- 
other. 

In particular, our results suggest that thermodynamic 
stability and kinetic accessibility, while correlated over a 
significant region, are in conflict near the extremes of
either; maximally stable sequences are not the fastest
folding and the fastest folders are not the most stable.
(We presented preliminary evidence to this end in [?].)
Accordingly, thermodynamically oriented sequence design
need not select for the fastest folding proteins and a re- 
duction in stability admits increased accessibility. If Na- 
ture has designed proteins to fold as quickly as possible, 
we would expect only marginal stability in the native 
conformation. The preceding premise might be estab- 
lished by observation of normal and mutated naturally 
occurring proteins. 

Notably, the bound on manipulating the energy land- 
scape is independent of protein length; the diversity of 
protein function grows with alphabet size only. The large 
(relative to k) amino acid alphabet found in Nature is 
crucial to the variety of protein function within the cell 
or in multicellular organisms. To the extent that het- 
eropolymer models are intended to provide insight into 
proteins, their alphabet sizes should reflect this.
Elementary representations, such as frequently studied H-P
models, are not able to effect the thermodynamic and
kinetic diversity possible with larger alphabets.

Perhaps most interesting is the increased scope for protein
and heteropolymer function. The discovery that prions fold
to multiple conformations [?] has extended our notion of
heteropolymer behaviour beyond familiar protein collapse.
We have presented arguments that the
energy landscape may, within limits, be tailored to ef- 
fect function heretofore unobserved. Further discovery 
of novel protein mechanisms should prove fascinating. 




FIG. 1. Folding in the presence of a funnel. The denatured
protein wanders through conformation space until it matches
the target structure coarse-grained to length scale
$g_{\max}$, after which the funnel quickly guides the
protein towards the target.

FIG. 2. Energy landscapes of sequences trained to have
increasingly broad funnels. Maximising stability (top)
corresponds to a deep, narrow well. As the length scale g
to which the funnel extends increases, the depth of the
target well is reduced; at $g = g_{\max}$, the slope of the
funnel is no longer sufficient to provide a free energy
minimum (bottom).